<p>
  In finance and economics, most models are linear. Linear regression appears everywhere, from the foundations of modern portfolio theory to the widely used Fama-French asset pricing model. Understanding how linear regression works is essential to understanding those theories.
</p>
<p>
  If we hold a stock, we naturally want to know how its return relates to the market return. Suppose we have held Amazon stock since the first trading day of 2017. To see the relationship directly, we plot the daily return of the stock on the y-axis against the daily return of the S&amp;P 500 index on the x-axis.
</p>
<div class="section-example-container">

<pre class="python">import numpy as np
import pandas as pd
import quandl
quandl.ApiConfig.api_key = 'YOUR_API_KEY'  # replace with your own Quandl API key
#get data from quandl
spy_table = quandl.get('BCIW/_SPXT')
amzn_table = quandl.get('WIKI/AMZN')
#fetch data from Jan 2017 to Jun 2017
spy = spy_table.loc['2017':'2017-6',['Close']]
amzn = amzn_table.loc['2017':'2017-6',['Close']]
#calculate log return
spy_log = np.log(spy.Close).diff().dropna()
amzn_log = np.log(amzn.Close).diff().dropna()
df = pd.concat([spy_log,amzn_log],axis = 1).dropna()
#name the columns to match the table and the plot below
df.columns = ['spy','amzn']
print(df.tail())
</pre>
</div>

<table cellspacing="0" cellpadding="3" class="table qc-table">
    <tr>
        <td>&nbsp;</td>
        <td align="center">spy</td>
        <td align="center">amzn</td>
    </tr>
    <tr>
        <td>2016-12-22</td>
        <td align="center">-0.004462</td>
        <td align="center">-0.005543</td>
    </tr>
    <tr>
        <td>2016-12-23</td>
        <td align="center">0.001372</td>
        <td align="center">-0.007531</td>
    </tr>
    <tr>
        <td>2016-12-23</td>
        <td align="center">0.000928</td>
        <td align="center">0.000946</td>
    </tr>
    <tr>
        <td>2016-12-23</td>
        <td align="center">-0.005671</td>
        <td align="center">-0.009081</td>
    </tr>
    <tr>
        <td>2016-12-23</td>
        <td align="center">0.002086</td>
        <td align="center">-0.020172</td>
    </tr>

</table>
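<p>
  To make the log-return step concrete, here is a minimal sketch on a tiny price series. The prices are made up purely for illustration, and the names <code>close</code>, <code>log_ret</code> and <code>log_ret_alt</code> are assumptions that do not appear in the Quandl data.
</p>
<div class="section-example-container">

<pre class="python">import numpy as np
import pandas as pd

# Hypothetical closing prices, invented for this example only
close = pd.Series([100.0, 102.0, 101.0, 105.0])

# Log return: ln(P_t) - ln(P_{t-1}); the first row has no previous price, so drop it
log_ret = np.log(close).diff().dropna()

# Equivalent form: ln(P_t / P_{t-1})
log_ret_alt = np.log(close / close.shift(1)).dropna()

print(log_ret)
</pre>
</div>
<p>
  One convenient property of log returns is that they add up over time: the sum of the daily log returns equals the log of the last price divided by the first price.
</p>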
<p>
  We have created a DataFrame containing the daily log returns of Amazon stock and the S&amp;P 500. Now let's plot it:
</p>

<div class="section-example-container">

<pre class="python">import matplotlib.pyplot as plt
plt.figure(figsize = (15,10))
plt.scatter(df.spy,df.amzn)
plt.show()
</pre>
</div>

<img class="img-responsive" src="https://cdn.quantconnect.com/tutorials/i/Tutorial09-plot1.png" alt="plot1" />
<p>
  The points are scattered, but the two series are clearly related: in general, the higher the S&amp;P 500's daily return, the higher Amazon's return. We say the two are <strong>positively correlated</strong>. We will cover correlation in later tutorials.
</p>
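<p>
  As a rough preview of what "positively correlated" means numerically, the sketch below computes a Pearson correlation coefficient with <code>np.corrcoef</code>. It uses synthetic returns, not the Quandl data. The names <code>market</code> and <code>stock</code>, the factor 1.2 and the noise levels are all assumptions chosen for illustration.
</p>
<div class="section-example-container">

<pre class="python">import numpy as np

# Synthetic daily returns, NOT real market data
rng = np.random.RandomState(0)
market = rng.normal(0.0, 0.01, 250)

# Hypothetical stock: moves with the market plus independent noise
stock = 1.2 * market + rng.normal(0.0, 0.01, 250)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the correlation between the two series
corr = np.corrcoef(market, stock)[0, 1]
print(corr)
</pre>
</div>
<p>
  The coefficient always lies between -1 and 1, and a value above zero indicates that the two series tend to move in the same direction. The same call could be applied to the two return columns of the DataFrame built earlier.
</p>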
